40 research outputs found

    Continuous Defect Prediction: The Idea and a Related Dataset

    Get PDF
    We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data rows, representing files involved in Continuous Integration (CI) builds, that synthesize the results of CI builds with data we mine from software repositories. Our dataset embraces 1265 software projects, 30,022 distinct commit authors and several software process metrics that in earlier research appeared to be useful in software defect prediction. In this particular dataset we use TravisTorrent as the source of CI data. TravisTorrent synthesizes commit level information from the Travis CI server and GitHub open-source projects repositories. We extend this data to a file change level and calculate the software process metrics that may be used, for example, as features to predict risky software changes that could break the build if committed to a repository with CI enabled.Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction: The Idea and a Related Dataset" In: 14th International Conference on Mining Software Repositories (MSR'17). Buenos Aires. 2017, pp. 515-518. doi: 10.1109/MSR.2017.46. URL: http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd

    Editorial

    Get PDF

    Defect prediction with bad smells in code

    Get PDF
    Background: Defect prediction in software can be highly beneficial for development projects, when prediction is highly effective and defect-prone areas are predicted correctly. One of the key elements to gain effective software defect prediction is proper selection of metrics used for dataset preparation. Objective: The purpose of this research is to verify, whether code smells metrics, collected using Microsoft CodeAnalysis tool, added to basic metric set, can improve defect prediction in industrial software development project. Results: We verified, if dataset extension by the code smells sourced metrics, change the effectiveness of the defect prediction by comparing prediction results for datasets with and without code smells-oriented metrics. In a result, we observed only small improvement of effectiveness of defect prediction when dataset extended with bad smells metrics was used: average accuracy value increased by 0.0091 and stayed within the margin of error. However, when only use of code smells based metrics were used for prediction (without basic set of metrics), such process resulted with surprisingly high accuracy (0.8249) and F-measure (0.8286) results. We also elaborated data anomalies and problems we observed when two different metric sources were used to prepare one, consistent set of data. Conclusion: Extending the dataset by the code smells sourced metric does not significantly improve the prediction effectiveness. Achieved result did not compensate effort needed to collect additional metrics. However, we observed that defect prediction based on the code smells only is still highly effective and can be used especially where other metrics hardly be used.Comment: Chapter 10 in Software Engineering: Improving Practice through Research (B. Hnatkowska and M. \'Smia{\l}ek, eds.), pp. 163-176, 201

    Editorial

    Get PDF

    Software Metrics in Boa Large-Scale Software Mining Infrastructure: Challenges and Solutions

    Get PDF
    In this paper, we describe our experience implementing some of classic software engineering metrics using Boa - a large-scale software repository mining platform - and its dedicated language. We also aim to take an advantage of the Boa infrastructure to propose new software metrics and to characterize open source projects by software metrics to provide reference values of software metrics based on large number of open source projects. Presented software metrics, well known and proposed in this paper, can be used to build large-scale software defect prediction models. Additionally, we present the obstacles we met while developing metrics, and our analysis can be used to improve Boa in its future releases. The implemented metrics can also be used as a foundation for more complex explorations of open source projects and serve as a guide how to implement software metrics using Boa as the source code of the metrics is freely available to support reproducible research.Comment: Chapter 8 of the book "Software Engineering: Improving Practice through Research" (B. Hnatkowska and M. \'Smia{\l}ek, eds.), pp. 131-146, 201
    corecore